Skip to content

Fix/responses streaming#4

Merged
lucaromagnoli merged 4 commits into
mainfrom
fix/responses-streaming
Apr 14, 2026
Merged

Fix/responses streaming#4
lucaromagnoli merged 4 commits into
mainfrom
fix/responses-streaming

Conversation

@lucaromagnoli

Copy link
Copy Markdown
Contributor

No description provided.

lucaromagnoli and others added 4 commits April 14, 2026 15:47
Without this override the base class parser (which expects Chat
Completions SSE: choices[0].delta.content) silently yields empty
tokens for every line of a Responses API stream, leaving accumulated
text empty and returning success=false with no error string.

Responses API deltas arrive as:
  data: {"type":"response.output_text.delta","delta":"token"}
Default ProviderConfig.maxTokens from 4096 to 0 so providers that gate
on if (maxTok > 0) omit the cap entirely. Letting OpenAI use its own
default matters for GPT-5 Responses where reasoning tokens count
against max_output_tokens — 4096 was being exhausted mid-reasoning
and returning incomplete/truncated responses.

Anthropic still falls back to 4096 internally because its API requires
max_tokens.
Anthropic's Messages API rejects this field with HTTP 400 — it was an
OpenAI-ism. `reasoningEffort` is now silently ignored for Anthropic.
Extended thinking uses a separate `thinking` block on supporting models.
@lucaromagnoli lucaromagnoli merged commit 365d277 into main Apr 14, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant